Spruce up the way we check for original input files. #314

Merged
delucchi-cmu merged 3 commits into main from issue/299/pathlib
May 22, 2024

Conversation

delucchi-cmu
Contributor

Change Description

Closes #299

Solution Description

Converts paths to strings before comparing them, so plain-text strings and pathlib objects that point at the same file compare as equal. Also converts both sides into lists before the comparison.
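As a rough illustration of the approach described above (not the actual code in this PR), the comparison could look something like the sketch below. The helper names `normalize_paths` and `same_input_files` are hypothetical and chosen only for this example.

```python
from pathlib import Path


def normalize_paths(paths):
    """Return a sorted, de-duplicated list of path strings.

    Hypothetical helper: accepts any mix of str and pathlib.Path values.
    """
    return sorted({str(p) for p in paths})


def same_input_files(original, current):
    """Compare two collections of paths after converting both to lists of strings."""
    return normalize_paths(original) == normalize_paths(current)


# A plain string and a Path for the same file no longer count as different.
print(same_input_files(["b.csv", "a.csv"], [Path("a.csv"), Path("b.csv")]))
```

Without the `str()` conversion, `"a.csv" == Path("a.csv")` evaluates to `False`, which is why a resumed pipeline could treat an unchanged file set as a different one.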

Code Quality

  • I have read the Contribution Guide and LINCC Frameworks Code of Conduct
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Bug Fix Checklist

  • My fix includes a new test that breaks as a result of the bug (if possible)
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

codecov bot commented May 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.52%. Comparing base (8b19bc2) to head (2b1dd4d).
Report is 49 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #314   +/-   ##
=======================================
  Coverage   99.52%   99.52%           
=======================================
  Files          25       25           
  Lines        1270     1270           
=======================================
  Hits         1264     1264           
  Misses          6        6           

@camposandro
Collaborator

Looks good to me!

I just have a small question: should we be writing the set of unique paths, unique_file_paths, instead of input_paths to disk? It doesn't really change the behavior in this case, since we create a set on read, but it might improve readability.

https://github.com/astronomy-commons/hipscat-import/blob/48dc9acc3765954af0a3a46bbbc7b6f0650370c0/src/hipscat_import/pipeline_resume_plan.py#L190-L193

@delucchi-cmu
Contributor Author

I think it would change the behavior, but only in that it would be more correct. Since we're sorting and de-duping before we check, we should use the results of those operations in the write-to-disk.
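A minimal sketch of that idea, assuming the paths are logged one per line in a text file inside a temporary directory. The function name `check_input_paths`, the file name `input_paths.txt`, and the error message are all hypothetical and are not taken from the repository.

```python
from pathlib import Path


def check_input_paths(tmp_dir, input_paths):
    """Verify that a resumed run sees the same input files as the first run.

    Hypothetical sketch: the sorted, de-duplicated list is used both for the
    comparison and for what gets written to disk.
    """
    unique_file_paths = sorted({str(p) for p in input_paths})
    log_file = Path(tmp_dir) / "input_paths.txt"  # assumed file name

    if log_file.exists():
        lines = log_file.read_text(encoding="utf-8").splitlines()
        original = sorted({line for line in lines if line})
        if original != unique_file_paths:
            raise ValueError("Input files differ from the original pipeline run.")
    else:
        # Write the normalized list, not the raw input_paths.
        log_file.write_text("\n".join(unique_file_paths) + "\n", encoding="utf-8")

    return unique_file_paths
```

Writing the normalized list means the file on disk already matches what the next run will compare against, instead of relying on the read side to clean it up.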

@delucchi-cmu delucchi-cmu merged commit 37ce248 into main May 22, 2024
9 checks passed
@delucchi-cmu delucchi-cmu deleted the issue/299/pathlib branch May 22, 2024 17:50
Development

Successfully merging this pull request may close these issues.

Pipeline resume should consider Pathlib path for determining fileset
2 participants